Let $(X_i)_{i=1,\ldots,n}$ be a possibly nonstationary sequence such that
$\mathcal{L}(X_i) = P_n$ if $i \le n\theta$ and $\mathcal{L}(X_i) = Q_n$ if
$i > n\theta$, where $0 < \theta < 1$ is the location of the change-point to
be estimated. We construct a class of estimators based on the empirical
measures and a seminorm on the space of measures defined through a family
of functions $\mathcal{F}$. We prove the consistency of the estimator and
give rates of convergence under very general conditions. In particular, the
$1/n$ rate is achieved for a wide class of processes including long-range
dependent sequences and even nonstationary ones. The approach unifies,
generalizes and improves on the existing results for both parametric and
nonparametric change-point estimation, applied to independent, short-range
dependent as well as long-range dependent sequences.

1. Introduction. The change-point problem, in which one must detect 
a change in the marginal distribution of a random sequence, is important in 
a wide range of applications and has therefore become a classical problem 
in statistics. A comprehensive review of the subject can be found in [5]. In 
this paper we consider the general case of nonparametric estimation that 
must be used when no a priori information regarding the marginal distri- 
butions before and after the change-point is known. Although this problem 
has been widely studied for independent sequences, studying dependent se- 
quences has importance for both theoretical reasons and numerous practical 
applications. In this paper we consider this challenging problem and de- 
velop a unified framework in which we can deal with sequences with quite 
general dependence structures. We prove that the rate of convergence of a 
broad family of nonparametric estimators is $O_p(n^{-1})$. This is a particularly
surprising result because the dependence structure of the sequence plays
absolutely no role in determining the rate of convergence. The rate
$O_p(n^{-1})$ is clearly optimal because there are only $n$ points in the sequence.

For independent sequences there is a wide literature, and both parametric 
and nonparametric methods have been widely studied. The nonparametric 
problem was considered by Carlstein [4], who proposed an estimator, proved 
its consistency and determined a rate of convergence. Dümbgen [6] embedded
the estimator proposed by Carlstein in a more general framework, improved
the rate of convergence in probability and derived the limiting distributions
for certain models. Ferger [7] considered the almost sure convergence for
Dümbgen's estimators. Yao, Huang and Davis [15] considered the case in
which the location of the change-point can tend to either 0 or 1 as the
sequence length tends to infinity. Ferger [8, 9] has investigated a number of 
features of change-point estimators including probability bounds and rates of 
weak and almost sure convergence. Since then several works have generalized 
these results to a weakly dependent or short-range dependent setting. 

In recent years the importance of long-memory or long-range dependent 
(LRD) processes has been realized in a wide range of applications, especially 
in the analysis of financial and telecommunication data. For the purposes of 
this paper we define real sequences $(X_i)_{i=1,\ldots,n}$ to be short-range
dependent (SRD) if
$\limsup_{n\to\infty} n^{-1} E[\sum_{i=1}^{n} (X_i - E[X_i])]^2 < \infty$
and LRD otherwise. Several works are concerned with the generalization of
the results for independent sequences to a SRD setting. However, estimating
change-points for LRD sequences poses a number of significant challenges and
there are far fewer known results in this case.

Parametric change-point estimation for LRD sequences, in which one typ- 
ically has a priori knowledge about the marginal distributions, has been con- 
sidered by a number of authors. Kokoszka and Leipus [12] considered the 
change in the mean for dependent observations for LRD sequences. They ob- 
tained rates in probability for the cumulative sum (CUSUM) change-point 
estimator and gave a rate of convergence of the estimator that gets worse 
as the strength of the dependence increases. The problem with a jump in 
the mean that tends to zero was considered by Horváth and Kokoszka [11].
They proved the consistency of the estimator and gave the limiting distri- 
bution. For sequences that have a change in the mean, Ben Hariz and Wylie 
[2] showed that the rate of convergence does not get worse as the strength 
of the dependence increases and that the rate of convergence for indepen- 
dent sequences is also achieved for both SRD and LRD sequences. In the 
nonparametric setting Giraitis, Leipus and Surgailis [10] derived a number 
of results that focused mainly on hypothesis testing. However, to our knowl- 
edge, there are no results regarding rates of convergence of nonparametric 
change-point estimation for LRD sequences. 
In this paper we adopt a very general framework that allows us to consider
a very general class of dependence structures. In particular, we make no as- 
sumption about stationarity in the dependence structure. This is especially 
important in practice because one can confidently make use of the proposed 
estimators on a sequence without checking for such stationarity (which is 
typically extremely difficult in practice). This framework represents a uni- 
fied setting in which independent, SRD and LRD sequences can be treated. 
We prove the consistency of a Dümbgen-type estimator and show that the
$O_p(n^{-1})$ rate of convergence for independent sequences is also achieved for
both SRD and LRD sequences. In addition, we consider the case in which 
the difference between the distributions before and after the change-point 
tends to zero. 



2. Main results. Let $(X_i)_{i=1,\ldots,n}$ be a sequence in a measurable
space $E$. The marginal distribution (which may depend on the sequence
length $n$) is given by

$$\mathcal{L}(X_i) = \begin{cases} P_n, & \text{if } i \le n\theta, \\ Q_n, & \text{if } i > n\theta, \end{cases}$$

where $0 < \theta < 1$ is the location of the change-point. This means that
we assume first-order stationarity on either side of the change-point, but
make no assumption about stationarity in the dependence structure of the
sequence.

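Given the sequence $(X_i)_{i=1,\ldots,n}$ we aim to estimate the location of
the change-point $\theta$ using an estimator of the general type

(2.1) $$\hat\theta_n = \frac{1}{n} \min\Big\{ \arg\max_{1 \le k < n} \{N(D_k)\} \Big\},$$

where $N$ is a (possibly random) seminorm on the space $\mathcal{M}$ of signed
finite measures on $E$,

(2.2) $$D_k = \Big[\frac{k}{n}\Big(1 - \frac{k}{n}\Big)\Big]^{1-\gamma} \Big\{ \frac{1}{k} \sum_{i=1}^{k} \delta_{X_i} - \frac{1}{n-k} \sum_{i=k+1}^{n} \delta_{X_i} \Big\}$$

and $\gamma$ is a parameter satisfying $0 \le \gamma < 1$. The estimator proposed
in [6] corresponds to the case of $\gamma = 1/2$.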

Estimators of this type consider all possible locations of the change-point,
$k$. For each possible $k$ they compute the difference between the empirical
probability distributions for the data points on either side of the proposed
change-point. This difference is then multiplied by the weighting factor
$[\frac{k}{n}(1 - \frac{k}{n})]^{1-\gamma}$. We then require a seminorm, $N$,
to measure the difference between the empirical probability distributions.
The estimator $\hat\theta_n$ is chosen to maximize the difference between the
empirical probability distributions under the given seminorm. The weighting
factor is required because, without it, values of $k$ near the end points
give rise to empirical distributions that contain few data points and
therefore give very large statistical errors.
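
The recipe described above can be sketched in a few lines of code. The following is only a minimal illustration under stated assumptions, not the authors' implementation: the data are assumed to be real-valued, the seminorm is taken to be the supremum over the sample points of the absolute difference between the two empirical distribution functions, and the function name `ks_changepoint` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def ks_changepoint(x, gamma=0.5):
    """Illustrative estimator of type (2.1): returns k_hat / n.

    Assumption: real-valued data and a sup-type seminorm evaluated at the
    sample points (one possible choice of N, chosen here for illustration).
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    grid = sorted(x)
    # total[j] = number of observations (out of all n) that are <= grid[j]
    total = [bisect_right(grid, g) for g in grid]
    # left[j] = number of the first k observations that are <= grid[j]
    left = [0] * n
    best_k, best_stat = 1, -1.0
    for k in range(1, n):
        # x[k-1] joins the left sample; it is <= every grid point from its rank on
        for j in range(bisect_left(grid, x[k - 1]), n):
            left[j] += 1
        weight = ((k / n) * (1.0 - k / n)) ** (1.0 - gamma)
        sup_diff = max(
            abs(l / k - (t - l) / (n - k)) for l, t in zip(left, total)
        )
        stat = weight * sup_diff
        if stat > best_stat:  # strict '>' keeps the smallest maximiser
            best_k, best_stat = k, stat
    return best_k / n
```

For example, `ks_changepoint([0.0] * 50 + [10.0] * 50)` scans $k = 1, \ldots, 99$ and returns the fraction of the sample before the detected change. The double loop costs $O(n^2)$ operations, which is adequate for an illustration but not tuned for long sequences.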

In the theorems stated below we will develop a framework that can deal 
with a very general class of estimators. Different seminorms represent using 
different measures of the difference between the distributions before and 
after the change-point. In the following, we give some examples of seminorms 
that have been used to estimate change-points for independent data. We will 
show that these estimators, and a much wider class, are also appropriate for 
estimating change-points in dependent data. For a measure $\nu$ on $E$ and
$f : E \to \mathbb{R}$, we define $\nu(f)$ as

(2.3) $$\nu(f) = \int f(x)\,\nu(dx).$$

For each choice of seminorm, we require a family of functions that we denote
by $\mathcal{F}$. For example, for parametric estimators that only consider a
single moment, $\mathcal{F}$ will only contain a single function.

Example 1. For a family of functions
$\mathcal{F} = \{1_{\cdot \le X_i}, i = 1, \ldots, n\}$, we define norms of a
measure $\nu$ via the quantities $d_i = \nu(1_{\cdot \le X_i})$. This
corresponds to the setting of [4]. For example,

(2.4) $$N(\nu) = \sup_{1 \le i \le n} |d_i|$$

corresponds to the $L^\infty$ or Kolmogorov-Smirnov norm and

(2.5) $$N_p(\nu) = \Big( \frac{1}{n} \sum_{i=1}^{n} |d_i|^p \Big)^{1/p}$$

corresponds to the $L^p$ norm. The cases $p = 1$ and $p = 2$ correspond to
the most commonly used $L^1$ and $L^2$ norms. Observe that in this example
the family is random and therefore the seminorm is also random.
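
To make the two seminorms of Example 1 concrete, the sketch below evaluates (2.4) and (2.5) for a discrete signed measure $\nu = \sum_j w_j \delta_{a_j}$. The representation of $\nu$ by atoms and weights and the name `example1_seminorms` are assumptions made purely for illustration.

```python
def example1_seminorms(atoms, weights, points, p=1.0):
    """Return (sup-norm, L^p-norm) of a discrete signed measure.

    Illustrative assumptions: nu = sum_j weights[j] * delta_{atoms[j]} and
    `points` plays the role of the sample X_1, ..., X_n in Example 1.
    """
    # d_i = nu(1_{. <= X_i}): total signed mass at or below the i-th point
    d = [sum(w for a, w in zip(atoms, weights) if a <= x) for x in points]
    sup_norm = max(abs(v) for v in d)
    lp_norm = (sum(abs(v) ** p for v in d) / len(d)) ** (1.0 / p)
    return sup_norm, lp_norm
```

With `atoms=[1, 2]`, `weights=[0.5, -0.5]` and `points=[0, 1, 2, 3]` the quantities $d_i$ are $0, 0.5, 0, 0$, so the two norms differ only in how they aggregate these values.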

Example 2. For $\mathcal{F} = \{f_p : x \to x^p, p = 1, \ldots, +\infty\}$ we
define the seminorm by

$$N(\nu) = \sum_{f \in \mathcal{F}} d(f) |\nu(f)|,$$

where $d(f)$ is a sequence of positive weights. This includes the parametric
estimators in which we estimate a change in some moments. For example,
differences in the $p$th moment can be detected using the seminorm that
applies the measure (2.2) to the function $f_p : x \to x^p$. This framework
can also deal with a weighted sum of all moments. This family requires high
moments of the marginal law to be finite. To overcome this restriction, one
can consider truncated moments, that is, a family given by
$\mathcal{F} = \{f_p^M : x \to x^p 1_{|x| \le M}, p = 1, \ldots\}$, where $M$
is a constant which can be arbitrarily large.

Example 3. $\mathcal{F} = \{1_D, D \in \mathcal{D}\}$, where $\mathcal{D}$ is
a family of sets which satisfies certain conditions, such as the family
being a VC subclass (see [6]). This means that the family of sets has a
covering number which grows polynomially (see [14]).

We now turn our attention to the dependence structure of the sequence. 
We note that for any given norm, one must apply the measure (2.2) to a 
family of functions. In this paper we will consider a very general class of 
dependence structures. For a given sequence we will allow the estimator to 
use families of functions that satisfy the following condition. 

Assumption 1. There exist constants $C > 0$ and $\rho > 0$ that are
independent of the sequence length such that

(2.6) $$\sup_{f \in \mathcal{F}} \sup_{1 \le i \le n-m} |\mathrm{corr}(f(X_i), f(X_{i+m}))| \le C m^{-\rho}.$$

This assumption simply states that for each of the functions $f$ in
$\mathcal{F}$ the correlation between $f(X_i)$ and $f(X_{i+m})$ must decay
algebraically or faster with $m$ as $m \to \infty$. This assumption is
satisfied for a very general class of data. We now give some examples for
which Assumption 1 is satisfied.

Example 4. Let $G_1$ and $G_2$ be any measurable functions and $(Z_i)$ be a
(possibly nonstationary) Gaussian sequence such that
$\sup_{1 \le i \le n-m} |\mathrm{corr}(Z_i, Z_{i+m})| \le C m^{-\rho}$ and
$X_i = G_1(Z_i)$ if $i \le n\theta$ and $X_i = G_2(Z_i)$ if $i > n\theta$. Then
for any family $\mathcal{F}$ such that $E(f^2(X_i)) < \infty$ for
$f \in \mathcal{F}$,

(2.7) $$\sup_{f \in \mathcal{F}} \sup_{1 \le i \le n-m} |\mathrm{corr}(f(X_i), f(X_{i+m}))| \le C m^{-\rho}$$

(see, e.g., [1]). In fact, this example can be extended to functions of Gaussian
vectors using the results of [1].

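Example 5. Let $(X_i)$ be defined by
$X_i = Z_i^{(1)} = \sum_{k=-\infty}^{\infty} b_k^{(1)} \varepsilon_{i-k}^{(1)}$
if $i \le n\theta$ and
$X_i = Z_i^{(2)} = \sum_{k=-\infty}^{\infty} b_k^{(2)} \varepsilon_{i-k}^{(2)}$
if $i > n\theta$, where $(b_k^{(1)})$ and $(b_k^{(2)})$ are real sequences and
$(\varepsilon_k^{(1)})$ and $(\varepsilon_k^{(2)})$ are random stationary
sequences with zero mean and finite variance. If
$\sum_{k,l=-\infty}^{\infty} |b_k^{(i)} b_l^{(j)} E(\varepsilon_0^{(i)} \varepsilon_{k-l}^{(j)})| < \infty$
for $i, j = 1, 2$, then $(Z_i^{(j)})$ exists almost surely and
$E((Z_i^{(j)})^2) < \infty$. Let
$r(k) = \sup_{i,j=1,2} |E(\varepsilon_0^{(i)} \varepsilon_k^{(j)})|$. If we
assume that $\sup_k |\cdots| \le C m^{-\alpha}$, for $\alpha > 0$, then
$|\mathrm{cov}(X_i, X_{i+m})| \le C' m^{-\alpha}$. This example includes FARIMA processes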
with correlated innovations such as GARCH processes. It allows us to model
long-range dependence and time-dependent conditional variance. These two
features are frequently encountered in financial time series. So, Assumption
1 is satisfied when $\mathcal{F}$ consists only of the identity function.
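
Assumption 1 can also be probed empirically for a given function $f$. The sketch below is a rough diagnostic only, not part of the paper's method: it assumes the values $f(X_i)$ come from a single segment that is approximately stationary (Assumption 1 itself requires no stationarity), and the names `sample_autocorr` and `loglog_slope` are hypothetical. It estimates the correlation at a few lags and fits a straight line to $\log|\mathrm{corr}|$ against $\log m$; the negative of the fitted slope is a crude estimate of the decay exponent $\rho$.

```python
import math


def sample_autocorr(values, lag):
    """Sample autocorrelation of `values` at a positive lag."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    denom = sum(c * c for c in centred)
    num = sum(centred[i] * centred[i + lag] for i in range(n - lag))
    return num / denom


def loglog_slope(lags, corrs):
    """Least-squares slope of log|corr| against log(lag).

    Under an algebraic decay corr(m) ~ C * m**(-rho) the slope is about -rho.
    """
    xs = [math.log(m) for m in lags]
    ys = [math.log(abs(c)) for c in corrs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx
```

In practice one would call `sample_autocorr` on the transformed data at lags such as 1, 2, 4, 8, ... and pass the results to `loglog_slope`. Sample correlations at large lags are noisy, so the fitted exponent should be read as indicative only.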

In Theorem 1 we develop conditions that can deal with countable families
of functions and norms that are bounded by weighted moments. In Theorem
2 we consider the case of uncountable families. In this case we need to control
the size of the family. This will be done by using covering numbers defined
in Assumption 2.

We begin by considering the case where the class of functions $\mathcal{F}$
is countable and the difference between the distributions before and after
the change-point may tend to zero as the sequence length, $n$, tends to
infinity. This theorem essentially handles the case in which the norm is
bounded by a sum of weighted moments and hence includes most commonly used
parametric estimators.

For $f$ in $\mathcal{F}$ we set

(2.8) $$\|f\| = \sup_{n \in \mathbb{N}} (P_n(f^2) + Q_n(f^2))^{1/2} = \sup_{n \in \mathbb{N}} (E_{P_n}[f^2] + E_{Q_n}[f^2])^{1/2}.$$

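Theorem 1. Assume that the norm $N$ satisfies

(2.9) $$N(\nu) \le \sum_{f \in \mathcal{F}} d(f) |\nu(f)|,$$

where $\mathcal{F}$ is a countable family of functions satisfying (2.6) and
$d(f)$ are positive constants such that
$\sum_{f \in \mathcal{F}} d(f) \|f\| < \infty$. We assume that there exists
a positive sequence $b_n$ such that

(2.10) $$P[N(P_n - Q_n) > b_n] \to 1 \quad \text{as } n \to \infty.$$

Let $\bar\rho = \min(1 - \varepsilon, \rho)$ for any $\varepsilon > 0$, where
$\rho$ is given in (2.6). If

(2.11) $$b_n^{-1} [n^{-\bar\rho/2} (1 + \ln(n) 1_{\gamma - 1 + \bar\rho/2 = 0}) + n^{\gamma - 1}] \to 0 \quad \text{as } n \to \infty,$$

then we have

(2.12) $$\hat\theta_n - \theta = O_p(n^{-1} b_n^{-2/\bar\rho}).$$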

We note that the largest possible value of $\bar\rho$ is strictly less than
unity and so as long as $\gamma \le 1/2$ we will always have
$\gamma - 1 + \bar\rho/2 \ne 0$, in which case we obtain a less restrictive
condition than [6] on the speed at which the difference between the
distributions before and after the jump tends to zero. Moreover, if the
sequence is LRD ($\rho < 1$), then we have more freedom in the choice of
$\gamma$, namely $\gamma < 1 - \rho/2$.

This theorem takes a simpler form when $N(P_n - Q_n)$ is bounded away
from zero. This is stated in the following corollary.
Corollary 1. Under Assumption 1, assume that the seminorm $N$ satisfies
(2.9) and (2.10) with $b_n \ge b > 0$. Then

(2.13) $$\hat\theta_n - \theta = O_p(n^{-1}).$$

Corollary 1 includes the commonly encountered case in which the distributions
$P_n$ and $Q_n$ do not depend on the sequence length and the seminorm
is nonrandom.

Equation (2.10) controls the rate at which the seminorm of the difference
between the two distributions decays to zero by stating that it decays more
slowly than some sequence $b_n$. In particular, if the seminorm is nonrandom,
one can take $b_n = 2^{-1} N(P_n - Q_n)$. Equation (2.11) requires that random
fluctuations arising from sums of the type (2.2), which have size of order
$n^{-\bar\rho/2} (1 + \ln(n) 1_{\gamma - 1 + \bar\rho/2 = 0}) + n^{\gamma - 1}$,
decay to zero faster than the sequence $b_n$ and consequently decay
faster than the distance between the two distributions. This is a natural
condition to be able to detect a change-point.
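
As a purely numerical illustration of condition (2.11), the sketch below evaluates the ratio of the fluctuation size to $b_n$ for given $n$, $b_n$, $\rho$ and $\gamma$. The function name `fluctuation_ratio`, the default value of the small constant `eps` (playing the role of $\varepsilon$ in $\bar\rho = \min(1 - \varepsilon, \rho)$) and the floating-point tolerance used to detect the borderline case are all assumptions of this sketch.

```python
import math


def fluctuation_ratio(n, b_n, rho, gamma, eps=1e-3):
    """Left-hand side of (2.11) for one value of n (illustrative helper)."""
    rho_bar = min(1.0 - eps, rho)
    exponent = gamma - 1.0 + rho_bar / 2.0
    # the logarithmic factor only appears in the borderline case exponent == 0
    log_term = math.log(n) if abs(exponent) < 1e-12 else 0.0
    bracket = n ** (-rho_bar / 2.0) * (1.0 + log_term) + n ** (gamma - 1.0)
    return bracket / b_n
```

For instance, with $\rho = 0.6$, $\gamma = 0.5$ and a slowly vanishing jump $b_n = n^{-0.1}$, evaluating the function at increasing $n$ shows the ratio shrinking, which is what (2.11) asks for.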

We now turn our attention to the case when the family $\mathcal{F}$ contains
an uncountable infinity of functions. The following theorem deals with an
extremely general set of norms including all of those considered by Carlstein [4].
In this case, under the assumptions that the family has a finite covering
number, we obtain the same rate of convergence as in (2.13) when $P_n$ and
$Q_n$ are independent of $n$. For the case in which the size of the difference
between $P_n$ and $Q_n$ tends to zero as $n \to \infty$ we obtain a rate that
depends on the covering number that will typically represent some loss on (2.12).

Assumption 2. Given two functions $l$ and $u$, the bracket $[l, u]$ is the set
of all functions $f$ with $l \le f \le u$. Given a norm $\|\cdot\|$ on a space
containing $\mathcal{F}$, an $\varepsilon$-bracket for $\|\cdot\|$ is a bracket
$[l, u]$ with $\|l - u\| < \varepsilon$. The bracketing number
$N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|)$ is the minimal number of
$\varepsilon$-brackets needed to cover $\mathcal{F}$.

A family $\mathcal{F}$ is said to satisfy Assumption 2 if

(2.14) $$\forall \varepsilon > 0 \quad N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|_X) < \infty,$$

where $\|\cdot\|_X$ is a norm satisfying
$\sup_{n \in \mathbb{N}} |P_n(|f|)| + |Q_n(|f|)| \le \|f\|_X$.

We refer the reader to the monograph of van der Vaart and Wellner [14]
for examples of bracketing numbers.

The following theorem considers the case when the difference between the 
distributions before and after the change-point may tend to zero. 

Theorem 2. Assume that the seminorm satisfies

(2.15) $$N(\nu) \le \sup\{|\nu(f)|, f \in \mathcal{F}\},$$

where $\mathcal{F}$ is a family of functions that satisfies
$\sup\{\|f\|, f \in \mathcal{F}\} < \infty$ and Assumptions 1 and 2. Let
$\bar\rho = \min(1 - \varepsilon, \rho)$ for any $\varepsilon > 0$, where
$\rho$ is given in (2.6), and $\varepsilon_n$ be any positive sequence that
tends to zero as $n \to \infty$. We assume that there exists a positive
sequence $b_n$ such that

(2.16) $$P(N(P_n - Q_n) > b_n) \to 1 \quad \text{as } n \to \infty$$

and

$$b_n^{-1} N_{[\,]}(b_n \varepsilon_n, \mathcal{F}, \|\cdot\|_X) [n^{-\bar\rho/2} (1 + \ln(n) 1_{\gamma - 1 + \bar\rho/2 = 0}) + n^{\gamma - 1}] \to 0.$$

Then we have

(2.17) $$\hat\theta_n - \theta = O_p(n^{-1} [b_n^{-1} N_{[\,]}(b_n \varepsilon_n, \mathcal{F}, \|\cdot\|_X)]^{2/\bar\rho}).$$

The following corollary considers the case in which the norm between the
distributions before and after the change-point is strictly positive. Provided
that the bracketing number is finite, the $n^{-1}$ convergence rate is achieved
for any norm within a class of functions satisfying Assumptions 1 and 2.

Corollary 2. Under Assumptions 1 and 2, assume that the seminorm
satisfies (2.10) with $b_n \ge b > 0$ and (2.15). Then (2.13) is satisfied.

Remark 1. In the case $b_n \ge b > 0$, Theorems 1 and 2 both give the
same $O_p(n^{-1})$ rate for both $\rho < 1$ and $\rho \ge 1$. For Theorem 1, in the case
b n — > with p > 1, it is possible to obtain the rate O p (n~ 1 b~ 2 ln 2 (n6^)) which 
can represent a marginally better result. A similar result can be obtained
for Theorem 2 with $b_n \to 0$ and $\rho > 1$. These results can be obtained by
modifying Lemma 1 of our proof using Theorem 3 in [13].

Remark 2. Assumption 1 can be replaced by the following more general, but
less intuitive, condition: there exist constants $C > 0$ and $\rho > 0$, such
that for any $m$

(2.18) $$E\Big( \sum_{i=k}^{k+m} [f(X_i) - E f(X_i)] \Big)^2 \le C m^{2-\rho}.$$

In this case $\|f\|$ can be replaced by unity in the assertions of Theorems 1
and 2. Observe that this assumption is particularly weak and satisfied by a
large class of processes and families of functions. We now present more
examples of commonly used time series models and families of functions that
satisfy (2.18) and Assumption 2.

Example 6. We begin by considering a linear process with a family of
functions that satisfies a Lipschitz condition. Let $\mathcal{F}$ be a family
of uniformly bounded functions such that
$\sup_{f \in \mathcal{F}} |f(x) - f(y)| \le C_1 |x - y|^{\eta_1}$ for some
$\eta_1 > 0$ and $C_1 > 0$. Then according to [14] $\mathcal{F}$ satisfies
Assumption 2 for any $L^p$ norm. We now show that if the sequence is drawn
from Example 5, then (2.18) is satisfied under additional weak conditions.
Let $X_i^v = \sum_{|k| \le v} b_k^{(j)} \varepsilon_{i-k}^{(j)}$ with $j = 1$
if $i \le n\theta$ and $j = 2$ if $i > n\theta$. Assume that
$(\varepsilon_i^{(1)}, \varepsilon_i^{(2)})$ are $q$-dependent and

(2.19) $$\exists \eta_2 > 0 \ \forall v \quad E[X_i - X_i^v]^2 \le C_2 v^{-\eta_2}.$$

For example, if $|b_k^{(1)}| + |b_k^{(2)}| \le C |k|^{-\beta}$ and
$\beta > 1/2$, then one can readily show that (2.19) is satisfied. The
sequence $X_i^v$ is $3v$-dependent for $v > q$, and so by using a blocking
technique we have $E(\sum_{i=k}^{k+m} \bar f(X_i^v))^2 \le C m v$,
where /(X) = /(X) - E[/(X)]. Letting v = m l ^ l+ ^\ we obtain 

'k+m, \ 2 /k+m \ 2 /k+m \ 2 

rf(m) > 



')) 



(fe+m \ z /fc+rn \ ^ /fe+m 

£/(*,) <2E E/(i; W ) +2E E(/(X 8 )-M 
i=fc / V i=fc / V i=fc 

So (2.18) is also satisfied and hence Theorem 2 applies. 

Example 7. In this example we consider a linear process given in Example 5
with a family composed of indicator functions, namely
$\mathcal{F} = \{f_x(\cdot) = 1_{\cdot \le x}, x \in \mathbb{R}\}$. This family
is relevant to the commonly used $L^p$ and $L^\infty$ norms in Example 1 for
which Assumption 2 is satisfied. We assume that
$(\varepsilon_i^{(1)}, \varepsilon_i^{(2)})$ are $q$-dependent and
$|b_k^{(1)}| + |b_k^{(2)}| \le C |k|^{-\beta}$ with $\beta > 1/2$. We begin by
assuming that $q = 1$. Then we have

$$E\Big( \sum_{i=k}^{k+m} \bar f_x(X_i) \Big)^2 \le 2 E\Big( \sum_{i=k}^{k+m} \bar f_x(X_i^v) \Big)^2 + 2 m^2 \sup_i E[\bar f_x(X_i) - \bar f_x(X_i^v)]^2,$$

where $\bar f_x(X) = f_x(X) - E[f_x(X)]$. Again, using the blocking technique,
we have $E(\sum_{i=k}^{k+m} \bar f_x(X_i^v))^2 \le C m v$. One can also show
that for some $\eta_1 > 0$,
$\sup_x \sup_i E[f_x(X_i) - f_x(X_i^v)]^2 \le C v^{-\eta_1}$. Then by choosing
$v \sim m^{1/(1+\eta_1)}$ we obtain
$E(\sum_{i=k}^{k+m} \bar f_x(X_i))^2 \le C m^{(2+\eta_1)/(1+\eta_1)}$. The case
of $q > 1$ can be handled by dividing the sum
$\sum_{i=k}^{k+m} \bar f_x(X_i)$ into $q$ blocks such that within each block
the innovations are independent. Hence (2.18) is satisfied and Theorem 2
applies.

Before presenting the proofs, we give an intuitive explanation of why the
rate of convergence of the estimator does not depend on the dependence
structure of the sequence. We define $t_k = k/n$. Then $D_k = D_n(t_k)$, where

$$D_n(t) = [t(1-t)]^{1-\gamma} \Big\{ \frac{1}{nt} \sum_{i=1}^{[nt]} \delta_{X_i} - \frac{1}{n(1-t)} \sum_{i=[nt]+1}^{n} \delta_{X_i} \Big\}$$

and $w(t) = t^\gamma (1-t)^\gamma$. We rewrite $D_n(t)$ as the sum of its mean
and a centered random component, $B_n(t)$,

(2.20) $$D_n(t) = \frac{1}{w(t)} \{ (P_n - Q_n) g(t) + B_n(t) \},$$

where $g(t) = t(1 - \theta_n) 1_{t \le \theta_n} + \theta_n (1 - t) 1_{t > \theta_n}$
is a piecewise linear function that takes its maximum at the point
$\theta_n = [n\theta]/n$ and $B_n$ is the empirical bridge measure given by

(2.21) $$B_n(t) = W_n(t) - t W_n(1),$$

(2.22) $$W_n(t) = \frac{1}{n} \sum_{i=1}^{[nt]} [\delta_{X_i} - \mathcal{L}(X_i)].$$
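
The decomposition (2.20) can be checked directly from (2.2). The following short derivation is a sketch under the assumptions that $t = k/n$ for an integer $k$ and that exactly $n\theta_n = [n\theta]$ observations have law $P_n$; it uses only the definitions (2.2), (2.21) and (2.22).

```latex
% Step 1: absorb the weight, using w(t) = t^\gamma (1-t)^\gamma.
\begin{align*}
D_n(t) &= [t(1-t)]^{1-\gamma}\Big\{\frac{1}{nt}\sum_{i \le nt}\delta_{X_i}
          - \frac{1}{n(1-t)}\sum_{i > nt}\delta_{X_i}\Big\} \\
       &= \frac{1}{w(t)}\cdot\frac{1}{n}\Big\{(1-t)\sum_{i \le nt}\delta_{X_i}
          - t\sum_{i > nt}\delta_{X_i}\Big\} \\
       &= \frac{1}{w(t)}\cdot\frac{1}{n}\Big\{\sum_{i \le nt}\delta_{X_i}
          - t\sum_{i \le n}\delta_{X_i}\Big\}.
\end{align*}
% Step 2: centre each \delta_{X_i} by \mathcal{L}(X_i). The centred part is
% W_n(t) - t W_n(1) = B_n(t); the remaining mean part is
\begin{align*}
\frac{1}{n}\Big\{\sum_{i \le nt}\mathcal{L}(X_i) - t\sum_{i \le n}\mathcal{L}(X_i)\Big\}
 &= \begin{cases}
      t(1-\theta_n)\,(P_n - Q_n), & t \le \theta_n, \\
      \theta_n(1-t)\,(P_n - Q_n), & t > \theta_n,
    \end{cases} \\
 &= (P_n - Q_n)\, g(t).
\end{align*}
```

Adding the two parts and dividing by $w(t)$ recovers (2.20), and it makes explicit that the deterministic part is proportional to the piecewise linear function $g$.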

Our main results stated in Theorems 1 and 2 occur because of the
cancellation of two competing effects. One of the effects is concerned with the
absolute magnitude of the random noise in $D_n(t)$. The mean component of
$D_n(t)$ is monotonically increasing for $t < \theta_n$ and monotonically
decreasing for $t > \theta_n$ and therefore takes its maximum at
$t = \theta_n$. The estimator is chosen by maximizing $N(D_n(t))$, so if the
noise is sufficiently small we would expect to obtain a good estimate. For
independent or SRD sequences the partial sums in the centered random
component of $D_n(t)$, namely $B_n(t)$, typically have a magnitude of order
$n^{-1/2}$ as $n \to \infty$. As shown by Dümbgen, this gives rise to typical
errors of order $n^{-1}$ in the estimator. For LRD sequences the partial sums
decay more slowly. This means that the stronger the dependence the larger the
random component in (2.2). This effect makes the estimation more difficult.
One might naively expect that this would mean that LRD sequences have a
slower rate of convergence than SRD or independent sequences. However, there
is another effect that is concerned with the variations in the noise in the
vicinity of the change-point. Correlations in LRD sequences imply that the
random noise $B_n(t)$ becomes correlated. This means that the random noise has
less rapid variation and local fluctuations become smaller. Estimation
requires one to find the global maximum of $N(D_n(t))$ and this depends
critically on the local variations in the vicinity of the change-point rather
than on the absolute magnitude of the noise. Hence the smaller the local
fluctuations are, the easier the estimation becomes. These two effects exactly
compensate and give the surprising feature that the overall rate of
convergence is the same for all dependence structures.

3. Simulations. In this section we present the results of numerical
simulations that investigate some of the important practical features of
change-point estimation. We confirm that the rate of convergence is
$O_p(n^{-1})$ for LRD, SRD and independent sequences. We also determine how
large the sequence length needs to be before the $O_p(n^{-1})$ rate is observed.
We considered the estimation of the change-point for a sequence that is
a function of a dependent Gaussian variable, $(Y_i)_{i=1,\ldots,n}$, with zero
mean and unit variance. We generated a sequence with a change in the marginal
distribution by taking

$$X_i = \begin{cases} Y_i^2 - 1, & \text{if } i \le n\theta, \\ 1 - Y_i^2, & \text{if } i > n\theta. \end{cases}$$

The sequence $(X_i)$ has the property that the marginal distributions before
and after the jump have the same mean and variance, but have different
skewness. We generated the Gaussian sequences $(Y_i)$ with a covariance given
by $r(n) = (1 + n^2)^{-\alpha/4} \sim n^{-\alpha/2}$ using the Durbin-Levinson
algorithm (see, e.g., [3]). The sequence $(X_i)$ satisfies Assumption 1 with
$\rho = \alpha$. We note that the Durbin-Levinson algorithm has complexity
$O(n^2)$ and so generating long sequences can be quite computationally
expensive.
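
The data-generating step just described can be sketched as follows. This is a minimal illustration written with the Python standard library, not the code used for the reported simulations; the function names are hypothetical, and the covariance $r(k) = (1 + k^2)^{-\alpha/4}$ is taken from the description above. The routine runs the Durbin-Levinson recursion, drawing each new value from its Gaussian conditional distribution given the past, and then applies the two transformations on either side of $[n\theta]$.

```python
import math
import random


def cauchy_type_acov(n, alpha):
    """Autocovariances r(0), ..., r(n-1) with r(k) = (1 + k^2)^(-alpha/4)."""
    return [(1.0 + k * k) ** (-alpha / 4.0) for k in range(n)]


def durbin_levinson_path(acov, rng):
    """Sample a zero-mean Gaussian path with autocovariance sequence `acov`.

    Returns the path, the final prediction coefficients and the final
    one-step prediction variance (the last two help to check the recursion).
    """
    n = len(acov)
    v = acov[0]  # variance of the first value
    y = [math.sqrt(v) * rng.gauss(0.0, 1.0)]
    phi = []  # phi[j] multiplies y[t-1-j] in the one-step predictor
    for t in range(1, n):
        kappa = (acov[t] - sum(phi[j] * acov[t - 1 - j] for j in range(t - 1))) / v
        phi = [phi[j] - kappa * phi[t - 2 - j] for j in range(t - 1)] + [kappa]
        v *= 1.0 - kappa * kappa
        mean = sum(phi[j] * y[t - 1 - j] for j in range(t))
        y.append(mean + math.sqrt(v) * rng.gauss(0.0, 1.0))
    return y, phi, v


def skewness_change_sequence(y, theta):
    """X_i = Y_i^2 - 1 for i <= [n*theta] and X_i = 1 - Y_i^2 afterwards."""
    k = int(math.floor(len(y) * theta))
    return [yi * yi - 1.0 if i < k else 1.0 - yi * yi for i, yi in enumerate(y)]
```

A typical call would be `y, _, _ = durbin_levinson_path(cauchy_type_acov(1000, 0.8), random.Random(0))` followed by `x = skewness_change_sequence(y, 0.5)`. Each step of the recursion costs $O(t)$ operations, which gives the $O(n^2)$ total cost mentioned above.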

We show results for the estimator that uses the Kolmogorov-Smirnov
norm (KS) (2.4) and the $L^1$ norm defined in (2.5) with $p = 1$. The parameter
$\gamma$ is equal to 0.5. We note, however, that taking different norms, such as
$p = 2$ in equation (2.5), yields qualitatively similar results. We considered
independent sequences, SRD sequences with $\alpha = 1.5$ and LRD sequences
with values of $\alpha = 1.0, 0.8, 0.6$ and $0.4$. We present simulations in
which the sequence length, $n$, varies between 1000 and 7000. The mean
absolute error $\mathrm{MAE} = E(|\hat\theta_n - \theta|)$ for each value of
$\alpha$ was estimated using 10,000 different sequences. In Figure 1 we plot
$n(\mathrm{MAE})$ against $n$ with 95% confidence intervals. Since
$\hat\theta_n - \theta = O_p(n^{-1})$, we anticipate that $n(\mathrm{MAE})$
should tend to a constant as $n$ tends to infinity. This is clearly seen in
Figure 1 for independent, SRD and LRD sequences. As the range of dependence
becomes longer, the value of $n$ required to obtain the $O_p(n^{-1})$ scaling
becomes larger. This is because the leading order correction to the
$O_p(n^{-1})$ rate contains partial sums that are a factor $n^{-\alpha/2}$
smaller than the leading term. So for small $\alpha$, large values of $n$ are
required for the leading order term to dominate the corrections.

4. Proofs. We will begin by proving that the estimators are consistent.
For Theorem 1 this is straightforward, but for Theorem 2 we require a
projection argument to deal with the uncountable size of the family
$\mathcal{F}$. Having proved consistency, we then turn our attention to the
rates proofs. The rates proofs follow a similar pattern to the consistency
proof and the techniques used are similar. In the proofs,
$C, C_1, C_2, \ldots$ denote generic constants that are independent of $n$ for
$n$ large enough whose values may differ in different equations. In general,
$\theta \notin \{k/n : k = 1, \ldots, n\}$ so we have defined
$\theta_n = [n\theta]/n$. To prove Proposition 1 below and Theorems 1 and 2 it
suffices to prove the assertions with $\theta$ replaced by $\theta_n$. In all
of the proofs, we will assume $\rho < 1$ since the proofs can be easily
adapted for the case $\rho \ge 1$ by replacing $\rho$ with $\bar\rho$.

We require the following lemmas for the proofs. The first one is a maximal 
inequality which is a special case of Theorem 1 in [13]. 
Lemma 1. Assume (2.6) with $\rho < 1$. Then there exists a constant
$D(\rho) > 0$ such that

(4.1) $$E\Big( \max_{1 \le k \le n} \Big| \sum_{i=1}^{k} [f(X_i) - E f(X_i)] \Big| \Big)^2 \le D(\rho) \|f\|^2 n^{2-\rho}.$$



The second lemma controls the size of the empirical bridge and is a simple 
consequence of (4.1). 

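Lemma 2. Assume (2.6) holds with $0 < \rho < 1$. Then there exists a
constant $D(\rho) > 0$ such that for any $0 < \kappa \le 1$

(4.2) $$E\Big[ \sup_{|t - \theta_n| \le \kappa} |(W_n(t) - W_n(\theta_n))(f)| \Big] \le D(\rho) \|f\| n^{-\rho/2} \kappa^{1-\rho/2}$$

and

(4.3) $$E\Big[ \sup_{|t| \le \kappa} |(W_n(t))(f)| + \sup_{|t| \le \kappa} |(B_n(t))(f)| \Big] \le D(\rho) \|f\| (n^{-\rho/2} \kappa^{1-\rho/2}).$$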



The third lemma controls the size of oscillations of the weighted empirical
bridge which we define as

$$B_n^w(t) = w^{-1}(t) B_n(t).$$

Lemma 3. Assume (2.6) with $\rho < 1$. Then there exist constants
$C(\theta, \eta)$ and $D(\rho)$ such that for $\kappa \le \eta$,

(4.4) $$E\Big( \sup_{|t - \theta_n| \le \kappa} |(B_n^w(t) - B_n^w(\theta_n))(f)| \Big) \le C(\theta, \eta) D(\rho) \|f\| n^{-\rho/2} \kappa^{1-\rho/2}.$$

Proof. Using Taylor's theorem to expand $w^{-1}(t)$ near $t = \theta_n$, we obtain

$$\begin{aligned} B_n^w(t) - B_n^w(\theta_n) &= w^{-1}(\theta_n)(W_n(t) - W_n(\theta_n)) \\ &\quad - (t - \theta_n)[w^{-1}(\theta_n) W_n(1) + (w^{-2}(\xi) w'(\xi))(W_n(t) - t W_n(1))], \end{aligned}$$

where $\xi \in (t, \theta_n)$. Therefore, for $\eta$ small enough and
$|t - \theta_n| < \eta$, there exists a constant $C(\theta, \eta)$ such that

(4.5) $$|(B_n^w(t) - B_n^w(\theta_n))(f)| \le w^{-1}(\theta_n) |(W_n(t) - W_n(\theta_n))(f)| + C(\theta, \eta) |t - \theta_n| \sup_{0 \le t \le 1} |W_n(t)(f)|.$$

Hence it suffices to control the size of the oscillations of $W_n(t)$. By (4.2)
and (4.3) of Lemma 2, we have

$$\begin{aligned} E\Big( \sup_{|t - \theta_n| \le \kappa} |(B_n^w(t) - B_n^w(\theta_n))(f)| \Big) &\le w^{-1}(\theta_n) E\Big( \sup_{|t - \theta_n| \le \kappa} |(W_n(t) - W_n(\theta_n))(f)| \Big) \\ &\quad + C(\theta, \eta) \kappa E\Big( \sup_{0 \le t \le 1} |W_n(t)(f)| \Big) \\ &\le w^{-1}(\theta_n) D(\rho) \|f\| n^{-\rho/2} \kappa^{1-\rho/2} + D(\rho) \|f\| C(\theta, \eta) \kappa n^{-\rho/2} \\ &\le C(\theta, \eta) D(\rho) \|f\| n^{-\rho/2} \kappa^{1-\rho/2}, \end{aligned}$$

where $C(\theta, \eta)$ may change in each occurrence, and the relation (4.4)
follows. □

4.1. Consistency proofs. We first recall some notation and introduce
some additional notation. Let $\delta_n = P_n - Q_n$,

$$h(t) = w^{-1}(t) (t(1 - \theta_n) 1_{t \le \theta_n} + \theta_n (1 - t) 1_{t > \theta_n})$$

and $B_n^w(t) = w^{-1}(t) B_n(t)$, where $B_n(t)$ is defined in (2.21) and
$w(t) = t^\gamma (1 - t)^\gamma$. For $t$ in $G_n = \{k/n, 1 \le k < n\}$ we
rewrite $D_n(t)$ defined in (2.20) as

$$D_n(t) = B_n^w(t) + h(t) \delta_n.$$

We also recall that $\hat\theta_n$ is a maximum of $\{N(D_n(t)), t \in G_n\}$.
The following proposition states the consistency of the estimators.

Proposition 1. Let $X$ be a sequence and $\mathcal{F}$ a family such that
(2.6) is satisfied. Assume that the conditions of Theorem 1 or Theorem 2 are
satisfied. Then

$$\forall \eta > 0 \quad P(|\hat\theta_n - \theta_n| > \eta) \to 0 \quad \text{as } n \to \infty.$$
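Proof of Theorem 1. By definition $\hat\theta_n$ is a maximum of $N(D_n(t))$.
So

(4.6) $$N(D_n(\hat\theta_n)) \ge N(D_n(\theta_n)).$$

Using (2.20), we obtain

$$N(B_n^w(\hat\theta_n) + \delta_n h(\hat\theta_n)) \ge N(B_n^w(\theta_n) + \delta_n h(\theta_n)).$$

Repeated use of the triangle inequality yields

$$\begin{aligned} N(B_n^w(\hat\theta_n)) &\ge N(B_n^w(\hat\theta_n) + \delta_n h(\hat\theta_n)) - N(\delta_n h(\hat\theta_n)) \\ &\ge N(\delta_n h(\theta_n)) - N(\delta_n h(\hat\theta_n)) - N(B_n^w(\theta_n)). \end{aligned}$$

Hence,

(4.7) $$N(B_n^w(\hat\theta_n)) + N(B_n^w(\theta_n)) \ge N(\delta_n)(h(\theta_n) - h(\hat\theta_n)).$$

We define $a_n = \inf_{|t - \theta_n| > \eta} \{h(\theta_n) - h(t)\}$. Then
$a_n \ge a > 0$ for $n$ large enough, because $h$ is monotonically increasing
for $t < \theta_n$ and monotonically decreasing for $t > \theta_n$. Since
$a_n$ is defined to be an infimum we obtain

(4.8) $$\begin{aligned} P[|\hat\theta_n - \theta_n| > \eta] &= P[N(B_n^w(\hat\theta_n)) + N(B_n^w(\theta_n)) \ge a N(\delta_n), |\hat\theta_n - \theta_n| > \eta] \\ &\le P[N(B_n^w(\hat\theta_n)) + N(B_n^w(\theta_n)) \ge a b_n, |\hat\theta_n - \theta_n| > \eta] \\ &\quad + P[N(\delta_n) \le b_n]. \end{aligned}$$

We use the fact that
$P[X + Y \ge \varepsilon, B] \le P[|X| \ge \varepsilon/2, B] + P[|Y| \ge \varepsilon/2, B]$,
for any random variables $X$ and $Y$, set $B$ and $\varepsilon > 0$, to obtain

(4.9) $$\begin{aligned} P[|\hat\theta_n - \theta_n| > \eta] &\le P\Big[ N(B_n^w(\hat\theta_n)) \ge \frac{a b_n}{2}, |\hat\theta_n - \theta_n| > \eta \Big] \\ &\quad + P\Big[ N(B_n^w(\theta_n)) \ge \frac{a b_n}{2} \Big] + P[N(\delta_n) \le b_n] \\ &= A_1 + A_2 + A_3. \end{aligned}$$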

We begin by controlling $A_1$. We will assume that $\eta$ is sufficiently small
such that $\theta_n - \eta > 0$ and $1 - \theta_n - \eta > 0$, since other
cases can be dealt with similarly. For the sake of brevity we introduce the
notation $\beta_{\min} = \min(\theta_n - \eta, 1 - \theta_n - \eta)$ and
$\beta_{\max} = \max(\theta_n - \eta, 1 - \theta_n - \eta)$. We introduce
sets $S_1, \ldots, S_J$ given by

$$S_j = \{t : 2^{-j} \le t(\theta_n - \eta)^{-1} < 2^{-j+1}\} \cup \{t : 2^{-j} \le (1 - t)(1 - \theta_n - \eta)^{-1} < 2^{-j+1}\}.$$

The integer $J$ is chosen so that
$n^{-1} 2^{J-1} < \beta_{\max} \le n^{-1} 2^{J}$. As $j$ increases
these sets become increasingly close to the end points of the domain and $J$
is chosen to be large enough so that the smallest and largest possible values
of the change point (i.e., $\hat\theta_n = 1/n$ and $\hat\theta_n = 1 - 1/n$)
are included in one of the sets. Then

(4.10) $$A_1 \le \sum_{j=1}^{J} P\Big[ \hat\theta_n \in S_j, N(B_n^w(\hat\theta_n)) \ge \frac{a b_n}{2} \Big] = \sum_{j=1}^{J} A_1(n, j),$$

where

(4.11) $$A_1(n, j) \le P\Big[ \sup_{t \in S_j} N(B_n(t)) \ge \frac{a b_n}{2} \inf_{t \in S_j} w(t) \Big].$$

16 S. BEN HARIZ, J. J. WYLIE AND Q. ZHANG 

A simple calculation shows that inf^^ w(t) = mm(w((9 n — r/)2^- J ), 
w((l — 9 n — 7/)2 _J ) > ^ in 2~ 37 ~ . Hence applying the Markov inequality to 
(4.11) we obtain 



sup N{B n (t)) 



In order to control E[sup tg5 . N(B n (t))] we need to control E[sup te5 . \B n (t)(f)\] 
for / £ J-. We use (2.9) to prove Proposition 1 under the conditions of The- 
orem 1 and use a chaining argument for Proposition 1 under the conditions 
of Theorem 2. The control of E[sup ie5 . \B n (t)(f)\] is formulated in Lemma 
2. Using (2.9) and applying Lemma 2, we obtain 

ii(n,i) < Q^2 a -i 6 -i E r sup d(f)\B n (t)(f)\ 

< Pj^a-X 1 £ d(f)D{p)\\f\\n~^((3 m ^f-"l\ 
Substituting the above inequality into (4.10), we obtain 
(4.12) A 1 < SP^J^a-X'Dip^ £ d(/)||/|| £ 2^ 1 +^'. 

It is easy to show that 
J 

3=1 

(4.13) 

< C(p, 7 )(1 + n^ 1+ ^ 2 Vi+p/2^o + lnnl 7 _ 1+p/2=0 ). 
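Substituting (4.13) into (4.12) and relabeling the constant yields

$$A_1 \le C_1 b_n^{-1}\big(n^{-\rho/2}(1 + \ln n\, \mathbf{1}_{\gamma - 1 + \rho/2 = 0}) + n^{\gamma - 1}\big) \sum_{f \in \mathcal{F}} d(f) \|f\|. \tag{4.14}$$

To control $A_2$ we make similar use of Lemma 2 to obtain

$$A_2 \le C_2 D(\rho) b_n^{-1} n^{-\rho/2} \sum_{f \in \mathcal{F}} d(f) \|f\|. \tag{4.15}$$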

Finally from (4.9), (4.14) and (4.15) we deduce

$$\mathbb{P}(|\hat\theta_n - \theta_n| > \eta) \le C b_n^{-1} \sum_{f \in \mathcal{F}} d(f) D(\rho) \|f\| \big(n^{\gamma - 1}(1 + \ln n\, \mathbf{1}_{\gamma - 1 + \rho/2 = 0}) + n^{-\rho/2}\big) + \mathbb{P}(N(\delta_n) < b_n).$$

Taking the limit $n \to \infty$ under the conditions (2.10) and (2.11) and the condition $\sum_{f \in \mathcal{F}} d(f) \|f\| < \infty$ completes the proof. □

Proof of Theorem 2. The consistency proof under the assumptions of Theorem 2 is identical to that of Theorem 1 up until (4.9). Then we proceed by using a projection argument to bound $A_1$, $A_2$ and $A_3$. This projection argument is needed to deal with the uncountable family of functions. Since $N(K) = N_{[\,]}(2^{-K}, \mathcal{F}, \|\cdot\|_X)$ is finite for any integer $K$, there exists a finite sequence of pairs of functions $(f_i^K, \Delta_i^K)_{1 \le i \le N(K)}$ such that for all $f \in \mathcal{F}$ there exists $i$ such that $|f - f_i^K| \le \Delta_i^K$, and $\|\Delta_i^K\|_X \le 2^{-K}$. For each $K$ we define a map $M$ from $\mathcal{F}$ to $\mathcal{F} \times \mathcal{F}$ by $M(f) = (f_{i(f)}^K, \Delta_{i(f)}^K) = (\pi_K(f), \Delta_K(f))$, where $i(f) = \inf\{1 \le i \le N(K) : f_i^K - \Delta_i^K \le f \le f_i^K + \Delta_i^K\}$.

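We assume that $\gamma - 1 + \rho/2 \ne 0$ (the case $\gamma - 1 + \rho/2 = 0$ can be handled similarly and is hence omitted). We apply the Markov inequality to $A_1$ in equation (4.9) and then use the assumption (2.15) on the seminorm $N$ to obtain

$$A_1 \le 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} |B_n^w(\hat\theta_n)(f)|\, \mathbf{1}_{|\hat\theta_n - \theta_n| > \eta}\Big).$$

To control $A_1$ we will consider two cases, $\hat\theta_n > \theta_n$ and $\hat\theta_n < \theta_n$, hence

$$A_1 \le 2a^{-1} b_n^{-1} \Big(\mathbb{E}\Big(\sup_{f \in \mathcal{F}} |B_n^w(\hat\theta_n)(f)|\, \mathbf{1}_{0 < \hat\theta_n < \theta_n - \eta}\Big) + \mathbb{E}\Big(\sup_{f \in \mathcal{F}} |B_n^w(\hat\theta_n)(f)|\, \mathbf{1}_{\theta_n + \eta < \hat\theta_n < 1}\Big)\Big) = A_1' + A_1''.$$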

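We first control $A_1'$. Writing $f = f - \pi_K(f) + \pi_K(f)$ gives

$$A_1' \le 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} |B_n^w(\hat\theta_n)(f - \pi_K(f))|\, \mathbf{1}_{0 < \hat\theta_n < \theta_n - \eta}\Big) + 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} |B_n^w(\hat\theta_n)(\pi_K(f))|\, \mathbf{1}_{0 < \hat\theta_n < \theta_n - \eta}\Big). \tag{4.16}$$

Using the definitions of $B_n$ and $W_n$, we observe that if $|\phi| \le g$, then

$$|B_n(t)(\phi)| \le |W_n(t)(g)| + |t W_n(1)(g)| + 4t \sup_i \mathbb{E}(g(X_i)). \tag{4.17}$$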



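Using the fact that $|f - \pi_K(f)| \le \Delta_K(f)$, applying (4.17) to the first term in (4.16) and the triangle inequality to the second term in (4.16), we obtain

$$\begin{aligned}
A_1' \le{} & 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} |w^{-1}(\hat\theta_n) W_n(\hat\theta_n)(\Delta_K(f))|\, \mathbf{1}_{0 < \hat\theta_n < \theta_n - \eta}\Big) \\
& + 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} |w^{-1}(\hat\theta_n) \hat\theta_n W_n(1)(\Delta_K(f))|\, \mathbf{1}_{0 < \hat\theta_n < \theta_n - \eta}\Big) \\
& + 8a^{-1} b_n^{-1} \sup_{0 < t < \theta_n - \eta} (w^{-1}(t) t) \sup_{f \in \mathcal{F}} \sup_{P \in \{P_n, Q_n\}} \mathbb{E}_P(|\Delta_K(f)|) \\
& + 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} |w^{-1}(\hat\theta_n) W_n(\hat\theta_n)(\pi_K(f))|\, \mathbf{1}_{0 < \hat\theta_n < \theta_n - \eta}\Big) \\
& + 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} |w^{-1}(\hat\theta_n) \hat\theta_n W_n(1)(\pi_K(f))|\, \mathbf{1}_{0 < \hat\theta_n < \theta_n - \eta}\Big) \\
={} & A_{1,1}' + A_{1,2}' + A_{1,3}' + A_{1,4}' + A_{1,5}'.
\end{aligned}$$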

Following a procedure similar to that used in the proof of Proposition 1 under the conditions of Theorem 1, we introduce the sets $S_1', \ldots, S_{J'}'$, defined as

$$S_j' = \{t : 2^{-j} < t \beta'^{-1} \le 2^{-j+1}\}.$$

Without loss of generality we assume $\beta' = \theta_n - \eta > 0$ and choose $J'$ to be the integer such that $n^{-1} \in S_{J'}'$, hence $\beta' 2^{-J'} < n^{-1} \le \beta' 2^{-J'+1}$. The proof proceeds in a similar way to that of Proposition 1 under the conditions of Theorem 1. We control $A_{1,1}'$ using Lemma 2 to obtain

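$$A_{1,1}' \le C_1 b_n^{-1} \sum_{i=1}^{N(K)} D(\rho) \|\Delta_i^K\| (n^{\gamma - 1} + n^{-\rho/2}). \tag{4.18}$$

Similar use of Lemma 2 on $A_{1,2}'$ yields

$$A_{1,2}' \le C_1 b_n^{-1} \sum_{i=1}^{N(K)} D(\rho) \|\Delta_i^K\| n^{-\rho/2}. \tag{4.19}$$

Similar bounds hold for $A_{1,4}'$ and $A_{1,5}'$. Combining these four bounds with the fact that $\sup_{f \in \mathcal{F}} \sup_{P \in \{P_n, Q_n\}} \mathbb{E}_P(|\Delta_K(f)|) \le 2^{-K}$ we obtain

$$A_1' \le C_1 b_n^{-1} \sum_{i=1}^{N(K)} D(\rho) [\|\Delta_i^K\| + \|f_i^K\|] (n^{\gamma - 1} + n^{-\rho/2}) + C_1 b_n^{-1} 2^{-K}. \tag{4.20}$$

A similar bound can be derived for $A_1''$. Hence, we conclude that

$$A_1 \le C_1 b_n^{-1} \sum_{i=1}^{N(K)} D(\rho) [\|\Delta_i^K\| + \|f_i^K\|] (n^{\gamma - 1} + n^{-\rho/2}) + C_1 b_n^{-1} 2^{-K}, \tag{4.21}$$

where $C_1$ is a constant that depends only on $\gamma$, $\theta$ and $\eta$.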
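To control $A_2$ we write

$$A_2 \le 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} \Big|\frac{B_n(\theta_n)}{w(\theta_n)}(f - \pi_K(f))\Big|\Big) + 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} \Big|\frac{B_n(\theta_n)}{w(\theta_n)}(\pi_K(f))\Big|\Big).$$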

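Applying (4.17) to the first term on the right-hand side of the above equation gives

$$\begin{aligned}
A_2 \le{} & 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} \Big|\frac{W_n(\theta_n)}{w(\theta_n)}(\Delta_K(f))\Big|\Big) + 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} \Big|\frac{\theta_n W_n(1)}{w(\theta_n)}(\Delta_K(f))\Big|\Big) \\
& + 8\theta_n w^{-1}(\theta_n) a^{-1} b_n^{-1} 2^{-K} + 2a^{-1} b_n^{-1}\, \mathbb{E}\Big(\sup_{f \in \mathcal{F}} |B_n^w(\theta_n)(\pi_K(f))|\Big).
\end{aligned}$$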

Hence, again by Lemma 2 we have

$$A_2 \le C_2 b_n^{-1} \sum_{i=1}^{N(K)} D(\rho) [\|\Delta_i^K\| + \|f_i^K\|] n^{-\rho/2} + C_2 b_n^{-1} 2^{-K}, \tag{4.22}$$

where $C_2$ is some constant depending only on $\gamma$, $\theta$ and $\eta$. Finally, from (4.9), (4.21) and (4.22) we have

$$\mathbb{P}(|\hat\theta_n - \theta_n| > \eta) \le C_3 b_n^{-1} D(\rho) N(K) \sup_{1 \le i \le N(K)} [\|\Delta_i^K\| + \|f_i^K\|] (n^{\gamma - 1} + n^{-\rho/2}) + C_3 b_n^{-1} 2^{-K} + \mathbb{P}(N(\delta_n) < b_n).$$

We choose $K$ such that $2^{-K} \sim b_n \epsilon_n$, where $\epsilon_n$ is any positive sequence that tends to zero. Since $b_n$ satisfies $b_n^{-1} N_{[\,]}(b_n \epsilon_n, \mathcal{F}, \|\cdot\|_X)(n^{\gamma - 1} + n^{-\rho/2}) \to 0$ and $\mathbb{P}(N(\delta_n) < b_n) \to 0$, taking the limit as $n \to \infty$ completes the proof. □



Remark 3. When $b_n \ge b > 0$, we choose $K$ to be independent of $n$. We first let $n$ tend to infinity and then let $K$ tend to infinity. This completes the proof without imposing any restriction on the rate of $N_{[\,]}(\epsilon, \mathcal{F}, \|\cdot\|_X)$.

4.2. Proof of Theorem 1. Let $M$ be a positive integer, $b$ and $c$ be positive real numbers and $r_n$ be a positive sequence. We first show that for $n$ large enough

$$\mathbb{P}(r_n |\hat\theta_n - \theta_n| > 2^M) \le E_1 + E_2 + E_3 + \mathbb{P}(|\hat\theta_n - \theta_n| > \eta), \tag{4.23}$$

where

$$\begin{aligned}
E_1 &\le \mathbb{P}\big[r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta,\ N(B_n^w(\hat\theta_n) - B_n^w(\theta_n)) \ge C |\hat\theta_n - \theta_n|\big], \\
E_2 &= \mathbb{P}(N(B_n^w(\theta_n)) > c), \\
E_3 &= \mathbb{P}(N(\delta_n) < b),
\end{aligned} \tag{4.24}$$

$C = C_h(b h(\theta_n) - 2c)$ and $C_h$ is a constant depending only on $\theta$ and $\gamma$.

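Recall that $\delta_n = P_n - Q_n$ and $B_n^w(t) = w^{-1}(t) B_n(t)$, where $B_n(t)$ is defined in (2.21). Then

$$D_n(t) = B_n^w(t) + h(t) \delta_n. \tag{4.25}$$

For all $t$ we have

$$D_n(t) = B_n^w(t) - B_n^w(\theta_n) + B_n^w(\theta_n) \Big(1 - \frac{h(t)}{h(\theta_n)}\Big) + \frac{h(t)}{h(\theta_n)} D_n(\theta_n).$$

Applying the seminorm and the triangle inequality to the above expression yields

$$N(D_n(t)) \le N(B_n^w(t) - B_n^w(\theta_n)) + \Big(1 - \frac{h(t)}{h(\theta_n)}\Big) N(B_n^w(\theta_n)) + \frac{h(t)}{h(\theta_n)} N(D_n(\theta_n)).$$

Therefore

$$N(D_n(t)) - N(D_n(\theta_n)) \le N(B_n^w(t) - B_n^w(\theta_n)) + \Big(\frac{h(t)}{h(\theta_n)} - 1\Big) \big(N(D_n(\theta_n)) - N(B_n^w(\theta_n))\big). \tag{4.26}$$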

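Let $\hat\theta_n$ be a maximum of $\{N(D_n(t)), t \in G_n\}$, where $G_n = \{k/n, 1 \le k < n\}$. Since $\hat\theta_n$ is a maximum, (4.26) with $t = \hat\theta_n$ gives

$$N(B_n^w(\hat\theta_n) - B_n^w(\theta_n)) \ge \Big(1 - \frac{h(\hat\theta_n)}{h(\theta_n)}\Big) \big[N(D_n(\theta_n)) - N(B_n^w(\theta_n))\big].$$

Applying the triangle inequality to (4.25) gives $N(D_n(\theta_n)) \ge N(\delta_n h(\theta_n)) - N(B_n^w(\theta_n))$ and therefore we obtain

$$N(B_n^w(\hat\theta_n) - B_n^w(\theta_n)) \ge \Big(1 - \frac{h(\hat\theta_n)}{h(\theta_n)}\Big) \big[N(\delta_n h(\theta_n)) - 2N(B_n^w(\theta_n))\big].$$

There exists $C_h$ which depends only on $\theta$ and $\gamma$ such that for all $t \in (0, 1)$,

$$1 - \frac{h(t)}{h(\theta_n)} \ge C_h |t - \theta_n|.$$

Therefore we obtain

$$N(B_n^w(\hat\theta_n) - B_n^w(\theta_n)) \ge C_h |\hat\theta_n - \theta_n| \big(N(\delta_n h(\theta_n)) - 2N(B_n^w(\theta_n))\big). \tag{4.27}$$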
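For any positive integer $M$ and any positive constants $b$ and $c$, we have

$$\begin{aligned}
\mathbb{P}(r_n |\hat\theta_n - \theta_n| > 2^M) \le{} & \mathbb{P}\big(r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta,\ N(\delta_n) \ge b,\ N(B_n^w(\theta_n)) \le c\big) \\
& + \mathbb{P}(N(B_n^w(\theta_n)) > c) + \mathbb{P}(N(\delta_n) < b) + \mathbb{P}(|\hat\theta_n - \theta_n| > \eta) \\
={} & E_1 + E_2 + E_3 + \mathbb{P}(|\hat\theta_n - \theta_n| > \eta).
\end{aligned}$$

Now from (4.27) we infer that

$$E_1 \le \mathbb{P}\big[r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta,\ N(B_n^w(\hat\theta_n) - B_n^w(\theta_n)) \ge C_h (b h(\theta_n) - 2c) |\hat\theta_n - \theta_n|\big].$$

This completes the proof of (4.23).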

In order to control $E_1$ we define the shells

$$S_{n,j} = \{t : 2^j < r_n |t - \theta_n| \le 2^{j+1}\}, \tag{4.28}$$

where $r_n$ is a positive sequence to be chosen later. Let $0 < \eta < \min(\theta_n, 1 - \theta_n)/2$ and $J = J(n, \eta)$ be chosen such that $2^J < r_n \eta \le 2^{J+1}$. From the definitions of the shells $S_{n,j}$ and $J$ we obtain

$$E_1 \le \sum_{j=M}^{J} \mathbb{P}\big[\hat\theta_n \in S_{n,j},\ N(B_n^w(\hat\theta_n) - B_n^w(\theta_n)) \ge C |\hat\theta_n - \theta_n|\big]. \tag{4.29}$$
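The shell decomposition (4.28) assigns each candidate value of the estimator to a dyadic band around $\theta_n$. As a minimal sketch, the following Python code computes the shell index and checks, for one hypothetical configuration, that the indices of all grid points with $r_n^{-1} 2^M < |t - \theta_n| \le \eta$ lie between $M$ and $J$. The function name `shell_index` and the numerical values are illustrative assumptions.

```python
import math

def shell_index(x):
    """Return the integer j with 2**j < x <= 2**(j + 1), for x > 0.

    Here x plays the role of r_n * |t - theta_n| in (4.28).
    """
    if x <= 0:
        raise ValueError("x must be positive; t = theta_n lies in no shell")
    j = math.ceil(math.log2(x)) - 1
    # Guard against floating-point rounding at dyadic boundaries.
    while not 2.0 ** j < x:
        j -= 1
    while not x <= 2.0 ** (j + 1):
        j += 1
    return j

# Hypothetical values: n = 1000, theta_n = 400/n, r_n = n, eta = 0.1, M = 2.
n, k0, M = 1000, 400, 2
r_n_eta = 100                     # r_n * eta
J = shell_index(r_n_eta)          # 2**J < r_n * eta <= 2**(J + 1)
indices = {
    shell_index(abs(k - k0))      # r_n * |k/n - theta_n| = |k - k0|
    for k in range(1, n)
    if 2 ** M < abs(k - k0) <= r_n_eta
}
print(J, sorted(indices))
```

With these values the script prints `6 [2, 3, 4, 5, 6]`, so the sum in (4.29) over $j = M, \ldots, J$ covers every admissible grid point.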

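Now, for $\hat\theta_n \in S_{n,j}$, we have $|\hat\theta_n - \theta_n| > 2^j r_n^{-1}$. Hence using (2.9) and (4.4), we get

$$E_1 \le C^{-1} C(\theta, \eta) \sum_{j=M}^{J} \sum_{f \in \mathcal{F}} d(f) \|f\| D(\rho) 2^{-(1/2) j \rho} r_n^{\rho/2} n^{-\rho/2}. \tag{4.30}$$

For $E_2$, using (2.9) and Lemma 2, we obtain

$$E_2 \le (c\, w(\theta_n))^{-1} \sum_{f \in \mathcal{F}} d(f) \|f\| D(\rho) n^{-\rho/2}. \tag{4.31}$$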

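Now, from (4.23), (4.30) and (4.31) we obtain

$$\begin{aligned}
\mathbb{P}(r_n |\hat\theta_n - \theta_n| > 2^M) \le{} & (b h(\theta_n) - 2c)^{-1} C(\theta, \eta) \sum_{j=M}^{J} \sum_{f \in \mathcal{F}} d(f) \|f\| D(\rho) 2^{-(1/2) j \rho} r_n^{\rho/2} n^{-\rho/2} \\
& + (c\, w(\theta_n))^{-1} D(\rho) \sum_{f \in \mathcal{F}} d(f) \|f\| n^{-\rho/2} \\
& + \mathbb{P}(N(\delta_n) < b) + \mathbb{P}(|\hat\theta_n - \theta_n| > \eta).
\end{aligned}$$

This inequality holds for any $b$, $c$ and $r_n$, so choosing $b = b_n$, $c = h(\theta_n) b_n / 4$ and $r_n = n b_n^{2/\rho}$ and relabeling the constants yields

$$\begin{aligned}
\mathbb{P}(r_n |\hat\theta_n - \theta_n| > 2^M) \le{} & C(\theta, \eta) D(\rho) \sum_{f \in \mathcal{F}} \sum_{j=M}^{J} d(f) \|f\| 2^{-(1/2) j \rho} \\
& + C(\theta) D(\rho) \sum_{f \in \mathcal{F}} d(f) \|f\| b_n^{-1} n^{-\rho/2} \\
& + \mathbb{P}(|\hat\theta_n - \theta_n| > \eta) + \mathbb{P}(N(\delta_n) < b_n).
\end{aligned}$$

Finally letting $n$, then $M$ tend to infinity completes the proof of Theorem 1.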
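The estimator analysed in this proof is a maximiser of $N(D_n(t))$ over the grid $G_n$. The following Python sketch shows how such a grid maximiser might be computed. It rests on assumptions that are not taken from this section: $D_n(t)(f)$ is taken to be $(t(1-t))^{1-\gamma}$ times the difference between the empirical means of $f$ before and after $nt$, the seminorm is taken to be $N(\nu) = \max_f |\nu(f)|$ over a finite family, and the function name `estimate_change_point` is hypothetical.

```python
def estimate_change_point(xs, funcs, gamma=0.5):
    """Grid maximiser of t -> N(D_n(t)) over G_n = {k/n, 1 <= k < n}.

    Assumed forms (illustrative, not taken from this section):
      D_n(t)(f) = (t(1-t))^(1-gamma) * (mean of f before nt - mean of f after nt),
      N(nu)     = max over f in funcs of |nu(f)|.
    """
    n = len(xs)
    # Prefix sums of f(X_i) for each f, so each grid point costs O(len(funcs)).
    prefix = []
    for f in funcs:
        acc, sums = 0.0, [0.0]
        for x in xs:
            acc += f(x)
            sums.append(acc)
        prefix.append(sums)
    best_k, best_val = 1, -1.0
    for k in range(1, n):
        t = k / n
        weight = (t * (1 - t)) ** (1 - gamma)
        val = max(
            weight * abs(s[k] / k - (s[n] - s[k]) / (n - k)) for s in prefix
        )
        if val > best_val:  # keep the first maximiser in case of ties
            best_k, best_val = k, val
    return best_k / n

# Noise-free toy sequence with a mean shift after the 30th of 100 observations.
xs = [0.0] * 30 + [1.0] * 70
print(estimate_change_point(xs, [lambda x: x]))
```

On this noise-free toy sequence the maximiser sits at the true change location; for random data the estimate fluctuates around it, which is the deviation that (4.23) controls.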

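4.3. Proof of Theorem 2. The proof of Theorem 2 is identical to that of Theorem 1 up until (4.23). We proceed by using a projection argument to bound $E_1$, $E_2$ and $E_3$. From (2.15) and (4.24) we have

$$\begin{aligned}
E_1 &\le \mathbb{P}\Big[r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta,\ \sup_{f \in \mathcal{F}} |(B_n^w(\hat\theta_n) - B_n^w(\theta_n))(f)| \ge C |\hat\theta_n - \theta_n|\Big] \\
&\le C^{-1}\, \mathbb{E}\Big[|\hat\theta_n - \theta_n|^{-1} \mathbf{1}_{r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta} \sup_{f \in \mathcal{F}} |(B_n^w(\hat\theta_n) - B_n^w(\theta_n))(f)|\Big].
\end{aligned}$$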

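Then from (4.5) we obtain

$$\begin{aligned}
E_1 \le{} & C^{-1}\, \mathbb{E}\Big[|\hat\theta_n - \theta_n|^{-1} \mathbf{1}_{r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta} \times \sup_{f \in \mathcal{F}} \Big|\frac{1}{w(\theta_n)} (W_n(\hat\theta_n) - W_n(\theta_n))(f)\Big|\Big] \\
& + C^{-1}\, \mathbb{E}\Big[\sup_{f \in \mathcal{F}} C(\theta, \eta) \sup_{0 \le t \le 1} |W_n(t)(f)|\Big] \\
={} & F_1 + G_1.
\end{aligned} \tag{4.32}$$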



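Using the same projection as in the consistency proof we obtain

$$\begin{aligned}
F_1 \le{} & C^{-1}\, \mathbb{E}\Big[|\hat\theta_n - \theta_n|^{-1} \mathbf{1}_{r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta} \times \sup_{f \in \mathcal{F}} \Big|\frac{1}{w(\theta_n)} (W_n(\hat\theta_n) - W_n(\theta_n))(f - \pi_K(f))\Big|\Big] \\
& + C^{-1}\, \mathbb{E}\Big[|\hat\theta_n - \theta_n|^{-1} \mathbf{1}_{r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta} \times \sup_{f \in \mathcal{F}} \Big|\frac{1}{w(\theta_n)} (W_n(\hat\theta_n) - W_n(\theta_n))(\pi_K(f))\Big|\Big].
\end{aligned}$$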




We observe that for any $\phi$ and $g$ such that $|\phi| \le g$,

$$|(W_n(t) - W_n(\theta_n))(\phi)| \le |(W_n(t) - W_n(\theta_n))(g)| + 2(|t - \theta_n| + n^{-1}) \sup_i \mathbb{E}(g(X_i)).$$

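Since $|f - \pi_K(f)| \le \Delta_K(f)$, by choosing $\phi = f - \pi_K(f)$ and $g = \Delta_K(f)$ in the above inequality, we have

$$\begin{aligned}
F_1 \le{} & C^{-1}\, \mathbb{E}\Big[|\hat\theta_n - \theta_n|^{-1} \mathbf{1}_{r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta} \times \sup_{f \in \mathcal{F}} \Big|\frac{1}{w(\theta_n)} (W_n(\hat\theta_n) - W_n(\theta_n))(\Delta_K(f))\Big|\Big] \\
& + 2C^{-1} (1 + n^{-1} r_n 2^{-M}) \sup_{f \in \mathcal{F}} \sup_{P \in \{P_n, Q_n\}} \mathbb{E}_P(|\Delta_K(f)|) \\
& + C^{-1}\, \mathbb{E}\Big[|\hat\theta_n - \theta_n|^{-1} \mathbf{1}_{r_n^{-1} 2^M < |\hat\theta_n - \theta_n| \le \eta} \times \sup_{f \in \mathcal{F}} \Big|\frac{1}{w(\theta_n)} (W_n(\hat\theta_n) - W_n(\theta_n))(\pi_K(f))\Big|\Big] \\
={} & F_{1,1} + F_{1,2} + F_{1,3}.
\end{aligned}$$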

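Using the decomposition of $\{t : r_n^{-1} 2^M < |t - \theta_n| \le \eta\}$ over the shells defined in (4.28), we obtain

$$\begin{aligned}
F_{1,1} &\le C^{-1} \sum_{j=M}^{J} \mathbb{E}\Big[|\hat\theta_n - \theta_n|^{-1} \mathbf{1}_{\hat\theta_n \in S_{n,j}} \sup_{f \in \mathcal{F}} \Big|\frac{1}{w(\theta_n)} (W_n(\hat\theta_n) - W_n(\theta_n))(\Delta_K(f))\Big|\Big] \\
&\le C^{-1} \sum_{j=M}^{J} \mathbb{E}\Big[(2^j r_n^{-1})^{-1} \mathbf{1}_{\hat\theta_n \in S_{n,j}} \sup_{f \in \mathcal{F}} \Big|\frac{1}{w(\theta_n)} (W_n(\hat\theta_n) - W_n(\theta_n))(\Delta_K(f))\Big|\Big] \\
&\le C^{-1} \sum_{i=1}^{N(K)} \sum_{j=M}^{J} (2^j r_n^{-1})^{-1}\, \mathbb{E}\Big[\sup_{t \in S_{n,j}} \Big|\frac{1}{w(\theta_n)} (W_n(t) - W_n(\theta_n))(\Delta_i^K)\Big|\Big].
\end{aligned}$$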

By (4.2) of Lemma 2 we get

$$F_{1,1} \le C^{-1} w^{-1}(\theta_n) D(\rho) \sum_{i=1}^{N(K)} \sum_{j=M}^{J} \|\Delta_i^K\| n^{-\rho/2} 2^{-(1/2) j \rho + 1} r_n^{\rho/2}. \tag{4.33}$$

A similar bound holds for $F_{1,3}$, and since $\sup_{f \in \mathcal{F}} \sup_{P \in \{P_n, Q_n\}} \mathbb{E}_P(|\Delta_K(f)|) \le 2^{-K}$ we get

$$F_1 \le C^{-1} \Big(\frac{r_n}{n}\Big)^{\rho/2} w^{-1}(\theta_n) D(\rho) \sum_{i=1}^{N(K)} \sum_{j=M}^{J} [\|\Delta_i^K\| + \|f_i^K\|] 2^{(-1/2) j \rho + 1} + C^{-1} (1 + n^{-1} r_n 2^{-M}) 2^{-K}. \tag{4.34}$$
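Similarly, we have

$$G_1 \le (b h(\theta_n) - 2c)^{-1} C(\theta, \eta) D(\rho) \sum_{i=1}^{N(K)} [\|\Delta_i^K\| + \|f_i^K\|] n^{-\rho/2} + (b h(\theta_n) - 2c)^{-1} C(\theta, \eta) 2^{-K}. \tag{4.35}$$

For $r_n \le n$, we have $n^{-1} r_n 2^{-M} \le 1$. From (4.32), (4.34) and (4.35) we obtain

$$\begin{aligned}
E_1 \le{} & (C_h (b h(\theta_n) - 2c))^{-1} \Big(\frac{r_n}{n}\Big)^{\rho/2} w^{-1}(\theta_n) D(\rho) \times \sum_{i=1}^{N(K)} \sum_{j=M}^{J} [\|\Delta_i^K\| + \|f_i^K\|] 2^{(-1/2) j \rho + 1} \\
& + (b h(\theta_n) - 2c)^{-1} C(\theta, \eta) D(\rho) \sum_{i=1}^{N(K)} [\|\Delta_i^K\| + \|f_i^K\|] n^{-\rho/2} \\
& + (b h(\theta_n) - 2c)^{-1} C(\theta, \eta) 2^{-K}.
\end{aligned} \tag{4.36}$$

For $E_2$ we use a similar argument to obtain

$$E_2 \le (c\, w(\theta_n))^{-1} \sum_{i=1}^{N(K)} D(\rho) [\|\Delta_i^K\| + \|f_i^K\|] n^{-\rho/2} + 2 (c\, w(\theta_n))^{-1} 2^{-K}. \tag{4.37}$$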

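By taking $b = b_n$ and $c = b_n h(\theta_n)/4$ and substituting (4.36) and (4.37) into (4.23), we have

$$\begin{aligned}
\mathbb{P}(r_n |\hat\theta_n - \theta_n| > 2^M) \le{} & C(\theta, \eta) D(\rho) b_n^{-1} \Big(\frac{r_n}{n}\Big)^{\rho/2} N(K) \sup_{f \in \mathcal{F}} \|f\| \sum_{j=M}^{J} 2^{(-1/2) j \rho} \\
& + C(\theta, \eta) D(\rho) N(K) b_n^{-1} n^{-\rho/2} \sup_{f \in \mathcal{F}} \|f\| \\
& + C(\theta, \eta) b_n^{-1} 2^{-K} + \mathbb{P}(|\hat\theta_n - \theta_n| > \eta) + \mathbb{P}(N(\delta_n) < b_n).
\end{aligned} \tag{4.38}$$

Choosing $K$ such that $b_n^{-1} 2^{-K} \sim \epsilon_n$ and $r_n$ such that $N(K) b_n^{-1} r_n^{\rho/2} n^{-\rho/2} = 1$, we obtain

$$\lim_{n \to \infty} \mathbb{P}(r_n |\hat\theta_n - \theta_n| > 2^M) \le C(\theta, \eta) D(\rho) \sup_{f \in \mathcal{F}} \|f\| \sum_{j \ge M} 2^{(-1/2) j \rho}. \tag{4.39}$$

Finally, letting $M$ tend to infinity ends the proof.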
Remark 4. If $b_n \ge b > 0$, we choose $r_n = n$. First let $n$ go to infinity, then let $M$ go to infinity and finally let $K$ go to infinity to obtain $\lim_{M \to \infty} \lim_{n \to \infty} \mathbb{P}(n |\hat\theta_n - \theta_n| > 2^M) = 0$, without imposing any restriction on the covering numbers.